Add default target_modules for nemotron_h hybrid Mamba-MoE models#3289

Open
A1c0r-Z wants to merge 1 commit into
huggingface:mainfrom
A1c0r-Z:add-nemotron-h-default-target-modules

@A1c0r-Z A1c0r-Z commented May 29, 2026

Add default target_modules for Nemotron-H

What this does

Registers default target_modules for the nemotron_h model type so that LoraConfig() (and the AdaLoRA, VBLoRA, and WaveFT configs) can be used on NVIDIA's Nemotron-3 hybrid Mamba + MoE + Attention models without specifying target_modules manually.

Defaults target the attention projections only (q_proj, k_proj, v_proj, o_proj). The Mamba mixer's in_proj / out_proj / conv1d are intentionally excluded because they belong to the Mamba block; out_proj and conv1d are additionally rejected by the Mamba-architecture compatibility check added in #2562. nemotron_h is also added to mamba_model_types so that check applies to it.
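To illustrate which layers the defaults select, here is a minimal sketch that assumes suffix-style matching on dotted module names. The module paths below are hypothetical and chosen only for illustration; real Nemotron-H checkpoints may name their layers differently.

```python
# Defaults proposed in this PR for nemotron_h.
DEFAULT_TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj"]

# Hypothetical module paths for one Mamba layer and one attention layer.
MODULE_NAMES = [
    "backbone.layers.0.mixer.in_proj",   # Mamba mixer
    "backbone.layers.0.mixer.conv1d",    # Mamba mixer
    "backbone.layers.0.mixer.out_proj",  # Mamba mixer
    "backbone.layers.1.mixer.q_proj",    # attention
    "backbone.layers.1.mixer.k_proj",    # attention
    "backbone.layers.1.mixer.v_proj",    # attention
    "backbone.layers.1.mixer.o_proj",    # attention
]


def is_targeted(name, targets):
    """Return True if the last path component of `name` equals a target."""
    return any(name == t or name.endswith("." + t) for t in targets)


selected = [n for n in MODULE_NAMES if is_targeted(n, DEFAULT_TARGETS)]
skipped = [n for n in MODULE_NAMES if n not in selected]

for name in selected:
    print("adapt:", name)
for name in skipped:
    print("skip: ", name)
```

Because the match in this sketch is anchored on the dot, o_proj does not accidentally pick up out_proj, so the Mamba mixer stays untouched.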

Why

Without these defaults, applying a LoraConfig() with no target_modules to a Nemotron-H model raises "Please specify target_modules". NVIDIA's own NeMo Automodel LoRA cookbook for Nemotron-3 works around this by passing exclude_modules=["*.out_proj"] and bypassing PEFT for the inner LoRA application. Handling nemotron_h natively in PEFT removes that friction upstream.
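The failure mode can be sketched with a small stand-in for the default lookup. This is not PEFT's actual code: resolve_target_modules and LORA_DEFAULTS are hypothetical names, and the existing mapping entry is illustrative.

```python
# Stand-in for the LoRA mapping in src/peft/utils/constants.py (illustrative entry).
LORA_DEFAULTS = {"llama": ["q_proj", "v_proj"]}


def resolve_target_modules(model_type, explicit=None, mapping=LORA_DEFAULTS):
    """Hypothetical helper: prefer explicit targets, else fall back to defaults."""
    if explicit is not None:
        return list(explicit)
    if model_type not in mapping:
        raise ValueError("Please specify target_modules")
    return list(mapping[model_type])


# Before the change: no entry for nemotron_h, so the lookup fails.
try:
    resolve_target_modules("nemotron_h")
    failed_before = False
except ValueError:
    failed_before = True

# After the change: the registered defaults are returned.
LORA_DEFAULTS["nemotron_h"] = ["q_proj", "k_proj", "v_proj", "o_proj"]
resolved_after = resolve_target_modules("nemotron_h")
print(failed_before, resolved_after)
```

Explicit targets still take precedence in this sketch, so users who already pass target_modules see no change in behavior.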

What changed

src/peft/utils/constants.py — adds "nemotron_h" to four mappings, following the precedent established in #3136 (gemma4):

  • TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING
  • TRANSFORMERS_MODELS_TO_ADALORA_TARGET_MODULES_MAPPING
  • TRANSFORMERS_MODELS_TO_VBLORA_TARGET_MODULES_MAPPING
  • TRANSFORMERS_MODELS_TO_WAVEFT_TARGET_MODULES_MAPPING

src/peft/tuners/tuners_utils.py — adds "nemotron_h" to mamba_model_types in _check_lora_target_modules_mamba so the Mamba forbidden-module check (out_proj, conv1d) applies. This protects users who might explicitly try to target those names without realizing they belong to the Mamba mixer.
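A simplified sketch of the guard described above, assuming it reduces to a membership test on the model type and the target name. The function name, the error text, and the abbreviated set of model types are assumptions for illustration, not the code in tuners_utils.py.

```python
# Module names the PR description lists as forbidden for Mamba-style models.
FORBIDDEN_MAMBA_MODULES = {"out_proj", "conv1d"}

# Abbreviated, illustrative set; "nemotron_h" is the entry this PR adds.
MAMBA_MODEL_TYPES = {"mamba", "nemotron_h"}


def check_target_module(model_type, target_name):
    """Raise if a forbidden Mamba module is targeted on a Mamba-type model."""
    if model_type in MAMBA_MODEL_TYPES and target_name in FORBIDDEN_MAMBA_MODULES:
        raise ValueError(
            f"Module '{target_name}' cannot be targeted for model type '{model_type}'."
        )


def is_blocked(model_type, target_name):
    """Convenience wrapper that reports whether the check rejects the target."""
    try:
        check_target_module(model_type, target_name)
    except ValueError:
        return True
    return False


print(is_blocked("nemotron_h", "out_proj"))  # forbidden Mamba module
print(is_blocked("nemotron_h", "q_proj"))    # attention projection
```

Registering the model type is what activates the guard: in this sketch a model type outside MAMBA_MODEL_TYPES passes even when out_proj is targeted.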

tests/test_custom_models.py — adds two tests under TestDefaultTargetModules:

  1. test_default_target_modules_nemotron_h: verifies the constant lookup for all four mappings and asserts the defaults do not include forbidden Mamba modules.
  2. test_nemotron_h_blocks_mamba_modules: verifies the Mamba compatibility check raises when a user explicitly targets out_proj on a model whose model_type is "nemotron_h".

Verified

Local syntax check passes (ast.parse on all three files). The test suite has not been run yet, pending environment setup.
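A syntax-only check of this kind can be reproduced roughly as follows. The helper name syntax_errors is hypothetical, and the demo runs against two temporary files so it works outside a PEFT checkout.

```python
import ast
import tempfile
from pathlib import Path


def syntax_errors(paths):
    """Parse each file with ast.parse and collect any syntax errors by path."""
    errors = {}
    for path in paths:
        p = Path(path)
        try:
            ast.parse(p.read_text(encoding="utf-8"), filename=str(p))
        except SyntaxError as exc:
            errors[str(p)] = f"line {exc.lineno}: {exc.msg}"
    return errors


# Demo on throwaway files: one valid, one deliberately broken.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.py"
    bad = Path(tmp) / "bad.py"
    good.write_text("x = 1\n", encoding="utf-8")
    bad.write_text("def broken(:\n", encoding="utf-8")
    demo_result = syntax_errors([good, bad])

print(demo_result)
```

In a PEFT checkout the same helper could be pointed at src/peft/utils/constants.py, src/peft/tuners/tuners_utils.py, and tests/test_custom_models.py. Note that ast.parse only confirms the files parse; it does not import them or exercise the new tests.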

Related

Registers "nemotron_h" in TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING
(and ADALORA / VBLORA / WAVEFT) with defaults [q_proj, k_proj, v_proj, o_proj],
so users can call LoraConfig() on Nemotron-3 without specifying target_modules.

Defaults intentionally exclude the Mamba mixer's in_proj / out_proj / conv1d,
which are blocked by the compatibility check added in huggingface#2562. nemotron_h is
also added to mamba_model_types so that check applies.

Adds two unit tests under TestDefaultTargetModules:
- verifies the constants for all 4 mappings
- verifies the Mamba check fires for nemotron_h + out_proj

Follows the gemma4 precedent set in huggingface#3136.